
The untold impact of learning approaches on software fault-proneness predictions: an analysis of temporal aspects

https://doi.org/10.1007/s10664-024-10454-8

Ahmad, Mohammad Jamil; Goseva-Popstojanova, Katerina; Lutz, Robyn R (July 2024, Empirical Software Engineering)

This paper aims to improve software fault-proneness prediction by investigating the unexplored effects on classification performance of the temporal decisions made by practitioners and researchers regarding (i) the interval for which they will collect longitudinal features (software metrics data), and (ii) the interval for which they will predict software bugs (the target variable). We call these specifics of the data used for training and of the target variable being predicted the learning approach, and explore the impact of the two most common learning approaches on the performance of software fault-proneness prediction, both within a single release of a software product and across releases. The paper presents empirical results from a study based on data extracted from 64 releases of twelve open-source projects. Results show that the learning approach has a substantial, and typically unacknowledged, impact on classification performance. Specifically, we show that one learning approach leads to significantly better performance than the other, both within-release and across-releases. Furthermore, this paper uncovers that, for within-release predictions, the difference in classification performance is due to different levels of class imbalance in the two learning approaches. Our findings show that improved specification of the learning approach is essential to understanding and explaining the performance of fault-proneness prediction models, as well as to avoiding misleading comparisons among them. The paper concludes with some practical recommendations and research directions based on our findings toward improved software fault-proneness prediction.

This preprint has not undergone any post-submission improvements or corrections. The Version of Record of this article is published in Empirical Software Engineering, and is available online at https://doi.org/10.1007/s10664-024-10454-8.

